# Wikipedia Pretraining

## NusaBERT Base

NusaBERT Base is a multilingual encoder language model based on the BERT architecture, supporting 13 Indonesian regional languages and pretrained on multiple open-source corpora.

- License: Apache-2.0
- Tags: Large Language Model, Transformers, Other
- Author: LazarusNLP
- Downloads: 68
- Likes: 3

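As a BERT-style encoder, it can be queried directly through the Hugging Face fill-mask pipeline. A minimal sketch, assuming the hub ID is `LazarusNLP/NusaBERT-base` (inferred from the card's author and name):

```python
# Minimal fill-mask sketch; the model ID below is inferred from the card's
# author/name and should be checked against the hub before use.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="LazarusNLP/NusaBERT-base")

# BERT-style encoders predict the [MASK] token; the pipeline returns the
# top-scoring candidates with their probabilities.
for pred in fill_mask("Ibu kota Indonesia adalah [MASK]."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```
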
## Multilingual ALBERT Base Cased 64k

A multilingual ALBERT model pretrained with the masked language modeling (MLM) objective, using a 64k-token case-sensitive vocabulary.

- License: Apache-2.0
- Tags: Large Language Model, Transformers, Supports Multiple Languages
- Author: cservan
- Downloads: 52
- Likes: 1

## LUKE Japanese Wordpiece Base

A Japanese LUKE model built on top of Japanese BERT, optimized for Japanese named entity recognition tasks.

- License: Apache-2.0
- Tags: Sequence Labeling, Transformers, Japanese
- Author: uzabase
- Downloads: 16
- Likes: 4

## DeBERTa V2 Base Japanese

A Japanese DeBERTa V2 base model pretrained on Japanese Wikipedia, CC-100, and OSCAR corpora, suitable for masked language modeling and downstream task fine-tuning.

- Tags: Large Language Model, Transformers, Japanese
- Author: ku-nlp
- Downloads: 38.93k
- Likes: 29

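For downstream fine-tuning, a task head can be attached on top of the pretrained encoder. A minimal sketch, assuming the hub ID `ku-nlp/deberta-v2-base-japanese` (inferred from the card) and noting that the upstream model card expects input pre-segmented with Juman++:

```python
# Sketch: attach a freshly initialized classification head to the encoder
# for fine-tuning. The hub ID is inferred from the card; the upstream card
# notes that input text should be pre-segmented with Juman++.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "ku-nlp/deberta-v2-base-japanese"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2  # head weights are randomly initialized
)

inputs = tokenizer("京都 大学 で 自然 言語 処理 を 学ぶ", return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, num_labels)
```
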
## RoBERTa Base Japanese with Auto Jumanpp

A Japanese pretrained model based on the RoBERTa architecture that runs Juman++ tokenization automatically, suitable for Japanese natural language processing tasks.

- Tags: Large Language Model, Transformers, Japanese
- Author: nlp-waseda
- Downloads: 536
- Likes: 8

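The "auto Juman++" variant performs morphological segmentation inside the tokenizer, so raw, unsegmented text can be passed in directly. A sketch, assuming Juman++ and its Python binding are installed locally and that the hub ID follows the card name:

```python
# Sketch: the tokenizer runs Juman++ internally, so raw Japanese text can
# be passed without pre-segmentation. Requires a local Juman++ install and
# its Python binding; the hub ID is inferred from the card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "nlp-waseda/roberta-base-japanese-with-auto-jumanpp"
)
print(tokenizer.tokenize("早稲田大学で自然言語処理を学ぶ"))
```
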
## DeBERTa Base Japanese Wikipedia

A DeBERTa (V2) model pretrained on Japanese Wikipedia and Aozora Bunko texts, suitable for Japanese text processing tasks.

- Tags: Large Language Model, Transformers, Japanese
- Author: KoichiYasuoka
- Downloads: 32
- Likes: 2

## ALBERT Base Japanese V1 with Japanese Tokenizer

A Japanese-pretrained ALBERT model that uses BertJapaneseTokenizer as its tokenizer, which makes Japanese text processing more convenient.

- License: MIT
- Tags: Large Language Model, Transformers, Japanese
- Author: ken11
- Downloads: 44
- Likes: 3

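Because the checkpoint ships with BertJapaneseTokenizer, it loads like other Japanese BERT-family models. A sketch, assuming the hub ID `ken11/albert-base-japanese-v1-with-japanese-tokenizer` and the MeCab bindings (e.g. fugashi) installed:

```python
# Sketch: AutoTokenizer resolves to BertJapaneseTokenizer here, so loading
# mirrors other Japanese BERT-family models. Hub ID inferred from the card;
# MeCab bindings (e.g. fugashi) must be installed for word segmentation.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "ken11/albert-base-japanese-v1-with-japanese-tokenizer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
print(tokenizer.tokenize("日本語のテキストを処理する"))
```
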
## mLUKE Base Lite

mLUKE is a multilingual extension of LUKE, supporting text processing tasks in 24 languages.

- License: Apache-2.0
- Tags: Large Language Model, Transformers, Supports Multiple Languages
- Author: studio-ousia
- Downloads: 153
- Likes: 2

## BERT Base Japanese Char

A BERT model pretrained on Japanese text using character-level tokenization, suitable for Japanese natural language processing tasks.

- Tags: Large Language Model, Japanese
- Author: tohoku-nlp
- Downloads: 116.10k
- Likes: 8

## Tiny RoBERTa Indonesia

A small Indonesian RoBERTa model, optimized for Indonesian text processing tasks.

- License: MIT
- Tags: Large Language Model, Transformers, Other
- Author: akahana
- Downloads: 17
- Likes: 1

## mLUKE Base

mLUKE is a multilingual extension of LUKE, supporting named entity recognition, relation classification, and question answering tasks in 24 languages.

- License: Apache-2.0
- Tags: Large Language Model, Transformers, Supports Multiple Languages
- Author: studio-ousia
- Downloads: 64
- Likes: 6

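LUKE-family models accept entity spans alongside the text and produce dedicated entity representations, which is what makes them suited to entity-centric tasks like NER and relation classification. A sketch of entity-aware encoding with the transformers LUKE classes, assuming the hub ID `studio-ousia/mluke-base`:

```python
# Sketch of LUKE-style entity-aware encoding: the tokenizer takes
# character-level entity spans and the model returns per-entity hidden
# states in addition to token states. Hub ID inferred from the card.
from transformers import MLukeTokenizer, LukeModel

model_id = "studio-ousia/mluke-base"
tokenizer = MLukeTokenizer.from_pretrained(model_id)
model = LukeModel.from_pretrained(model_id)

text = "Jakarta is the capital of Indonesia."
entity_spans = [(0, 7)]  # character span covering "Jakarta"

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
outputs = model(**inputs)
print(outputs.entity_last_hidden_state.shape)  # (1, 1, hidden_size)
```
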
## BERT Base Multilingual Cased Fine-tuned Polish SQuAD1

A Polish question answering model fine-tuned from multilingual BERT, performing well on the Polish SQuAD1.1 dataset.

- Tags: Question Answering System, Other
- Author: henryk
- Downloads: 86
- Likes: 4

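Extractive QA checkpoints like this one plug straight into the question-answering pipeline. A sketch, assuming the hub ID `henryk/bert-base-multilingual-cased-finetuned-polish-squad1` (inferred from the card):

```python
# Sketch: extractive question answering over a Polish context; the hub ID
# is inferred from the card's author/name.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="henryk/bert-base-multilingual-cased-finetuned-polish-squad1",
)
result = qa(
    question="Jak nazywa się stolica Polski?",
    context="Warszawa jest stolicą i największym miastem Polski.",
)
print(result["answer"], result["score"])
```
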
## BERT Base Japanese

A BERT model pretrained on Japanese Wikipedia text, using the IPA dictionary for word-level tokenization, suitable for Japanese natural language processing tasks.

- Tags: Large Language Model, Japanese
- Author: tohoku-nlp
- Downloads: 153.44k
- Likes: 38

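The difference between this word-level variant and the character-level variant listed above shows up directly in the tokenizer output. A sketch comparing the two, assuming the tohoku-nlp hub IDs and the MeCab dependencies (fugashi, ipadic) installed:

```python
# Sketch contrasting word-level (IPA dictionary) and character-level
# tokenization in the tohoku-nlp checkpoints. Hub IDs are inferred from
# the cards; the word-level tokenizer needs fugashi and ipadic.
from transformers import AutoTokenizer

word_tok = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese")
char_tok = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese-char")

text = "東北大学で学ぶ"
print(word_tok.tokenize(text))  # morpheme-level pieces
print(char_tok.tokenize(text))  # one token per character
```
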
## BERT Base Japanese Whole Word Masking

A BERT model pretrained on Japanese text using IPA dictionary tokenization and whole word masking.

- Tags: Large Language Model, Japanese
- Author: tohoku-nlp
- Downloads: 113.33k
- Likes: 65

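Whole word masking masks all sub-word pieces of a selected word together rather than masking pieces independently. transformers ships a data collator implementing this; a sketch of how such pretraining batches are produced, assuming the tohoku-nlp hub ID and the MeCab dependencies installed:

```python
# Sketch of whole word masking: every sub-word piece of a chosen word is
# masked together, the technique this checkpoint was pretrained with.
# Hub ID inferred from the card; needs fugashi and ipadic installed.
from transformers import AutoTokenizer, DataCollatorForWholeWordMask

tokenizer = AutoTokenizer.from_pretrained(
    "tohoku-nlp/bert-base-japanese-whole-word-masking"
)
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

batch = collator([tokenizer("日本語のテキストを処理する")])
print(batch["input_ids"])  # some whole words replaced by [MASK]
print(batch["labels"])     # original ids at masked positions, -100 elsewhere
```
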
## BERT Large Japanese Char

A BERT model pretrained on Japanese Wikipedia, using character-level tokenization together with a whole word masking strategy, suitable for Japanese natural language processing tasks.

- Tags: Large Language Model, Japanese
- Author: tohoku-nlp
- Downloads: 24
- Likes: 4

## BERT Base Japanese V2

A BERT model pretrained on Japanese Wikipedia, using the Unidic dictionary for word-level tokenization and whole word masking.

- Tags: Large Language Model, Japanese
- Author: tohoku-nlp
- Downloads: 12.59k
- Likes: 26

## BERT Large Japanese

A BERT large model pretrained on Japanese Wikipedia, using Unidic dictionary tokenization and a whole word masking strategy.

- Tags: Large Language Model, Japanese
- Author: tohoku-nlp
- Downloads: 1,272
- Likes: 9

## mLUKE Large

mLUKE is the multilingual extension of LUKE, supporting named entity recognition, relation classification, and question answering tasks in 24 languages.

- License: Apache-2.0
- Tags: Large Language Model, Transformers, Supports Multiple Languages
- Author: studio-ousia
- Downloads: 70
- Likes: 2

## BERT Base En Hi Cased

A smaller version of bert-base-multilingual-cased covering English and Hindi that preserves the original model's accuracy.

- License: Apache-2.0
- Tags: Large Language Model, Other
- Author: Geotrend
- Downloads: 15
- Likes: 0

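These "slim" checkpoints keep the original weights but shrink the vocabulary, and therefore the embedding matrix, to the target languages. A sketch that makes the reduction visible by comparing vocabulary sizes, assuming the hub ID `Geotrend/bert-base-en-hi-cased`:

```python
# Sketch: compare vocabulary sizes of the full multilingual model and the
# slimmed English+Hindi variant. The Geotrend hub ID is inferred from the
# card; the smaller vocabulary is where most of the size saving comes from.
from transformers import AutoTokenizer

full = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
slim = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-hi-cased")

print(len(full))  # ~119k entries covering 104 languages
print(len(slim))  # far fewer entries, English and Hindi only
```
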
## BERT Base Ja Cased

A smaller Japanese-only version of bert-base-multilingual-cased that preserves the original model's accuracy.

- License: Apache-2.0
- Tags: Large Language Model, Japanese
- Author: Geotrend
- Downloads: 13
- Likes: 0